Text Recognition

The Best 201 Text Recognition Tools in 2025

Table Transformer Structure Recognition

A Table Transformer model trained on the PubTables-1M dataset for extracting table structure from unstructured documents.

Text Recognition

Trocr Small Handwritten

TrOCR is a Transformer-based optical character recognition model specifically designed for handwritten text image recognition.

Text Recognition

Table Transformer Structure Recognition V1.1 All

A Transformer-based model for table structure recognition, designed to detect table structures in documents

Text Recognition

Trocr Large Printed

Transformer-based optical character recognition model for single-line printed text recognition

Text Recognition

Texify is an OCR tool specifically designed to convert formula images and text into LaTeX format.

Text Recognition

Trocr Base Printed

TrOCR is a Transformer-based optical character recognition model designed for single-line text image recognition, employing an encoder-decoder architecture

Text Recognition
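As a rough illustration of the encoder-decoder flow described for TrOCR above, the sketch below runs one single-line image through the Hugging Face `transformers` TrOCR classes. It assumes the `transformers`, `torch`, and `Pillow` packages are installed and that this listing entry corresponds to the `microsoft/trocr-base-printed` checkpoint; the function name `recognize_line` is hypothetical.

```python
def recognize_line(image_path: str,
                   checkpoint: str = "microsoft/trocr-base-printed") -> str:
    """Return the text read from a cropped single-line image.

    The checkpoint name is an assumption about which model this entry refers to.
    """
    # Local imports keep the heavy dependencies out of module import time.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

    # The image encoder expects RGB input; the processor resizes and normalizes it.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # The text decoder generates token ids autoregressively from the encoded image.
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Calling `recognize_line("line.png")` on a cropped text line should return the recognized string. Full pages would first need a text detection step, since the model is described as working on single lines.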

An optical character recognition tool specifically designed for Japanese text, optimized for Japanese manga scenarios.

Text Recognition

Transformers Japanese

Tiny Random Internvl2

A model focused on extracting text from images and converting it into editable text.

Text Recognition

Trocr Large Handwritten

TrOCR is a Transformer-based optical character recognition model specifically designed for handwritten text recognition, fine-tuned on the IAM dataset.

Text Recognition

Trocr Small Printed

TrOCR is a Transformer-based optical character recognition model designed for single-line text image OCR tasks.

Text Recognition

Lilt Roberta En Base

Language-independent Layout Transformer (LiLT) provides a LayoutLM-like model for any language by combining pre-trained RoBERTa (English) with a pre-trained language-independent layout transformer (LiLT).

Text Recognition

CRAFT is a multilingual text detection model that locates text regions in images. This version is optimized for Persian text but also supports other languages.

Text Recognition Supports Multiple Languages

PP OCRv5 Server Det

PP-OCRv5_server_det is the latest generation of text detection model developed by the PaddleOCR team. It is designed for high-performance application scenarios and detects text in a range of conditions, including handwritten, vertical, rotated, and curved text, across multiple languages.

Text Recognition Supports Multiple Languages

PP OCRv5 Server Rec

PP-OCRv5_server_rec is the latest generation of text line recognition model developed by the PaddleOCR team, supporting the recognition of multilingual and complex text scenarios.

Text Recognition Supports Multiple Languages

UVDoc applies geometric transformations to document images to correct warping, tilt, and perspective distortion, which improves the accuracy of subsequent text recognition.

Text Recognition Supports Multiple Languages

Trocr Base Handwritten Hist Swe 2

A historical handwriting recognition model jointly developed by the Swedish National Archives and other institutions, specifically designed for Swedish handwritten texts from 1600-1900.

Text Recognition

Transformers Other

Pix2Text's Mathematical Formula Recognition (MFR) model, built on the TrOCR architecture, converts images of mathematical formulas into LaTeX.

Text Recognition

MGP-STR is a pure vision-based scene text recognition model that achieves efficient OCR through multi-granularity prediction.

Text Recognition

TexTeller is an end-to-end formula recognition model based on the ViT architecture, capable of recognizing mathematical formulas in natural images and converting them into LaTeX format.

Text Recognition

Trocr Large Stage1

TrOCR is a Transformer-based pre-trained model for Optical Character Recognition (OCR) tasks.

Text Recognition

Crnn Base Fa V2

An OCR model designed for the Persian language, based on a CNN+LSTM architecture and optimized for printed and scanned documents. It also recognizes digits and special characters.

Text Recognition Other
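CNN+LSTM recognizers of this kind are commonly trained with a CTC objective, where the network emits one class distribution per horizontal time step and a decoder collapses them into text. Whether this particular model uses CTC is an assumption, not something the listing states. A minimal sketch of greedy CTC decoding, with the hypothetical helper name `ctc_greedy_decode`:

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(scores: Sequence[Sequence[float]],
                      labels: Sequence[str],
                      blank: int = 0) -> str:
    """Greedy CTC decoding.

    scores: one row of per-class scores for each time step.
    labels: one entry per class; the entry at index `blank` is ignored.
    """
    # 1. Pick the highest-scoring class at every time step.
    best: List[int] = [max(range(len(step)), key=step.__getitem__)
                       for step in scores]

    # 2. Collapse consecutive repeats, then 3. drop blanks.
    chars: List[str] = []
    prev: Optional[int] = None
    for idx in best:
        if idx != prev and idx != blank:
            chars.append(labels[idx])
        prev = idx
    return "".join(chars)


# Toy example: classes are [blank, "a", "b"]; frames read a, a, blank, a, b, b.
demo_scores = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
demo_text = ctc_greedy_decode(demo_scores, ["", "a", "b"])
```

The blank frame between the two runs of "a" is what lets the decoder keep a doubled letter, so `demo_text` is `"aab"`.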

Qari OCR 0.1 VL 2B Instruct

An Arabic OCR model fine-tuned from Qwen2-VL and optimized for full-page Arabic text recognition.

Text Recognition

Transformers Arabic

Crnn Fa Printed 96 Long

An OCR model optimized for the Persian language, based on a CNN+LSTM architecture and designed for printed and scanned documents.

Text Recognition Other

A Thai and English optical character recognition model fine-tuned from the TrOCR base handwriting model, excelling in processing handwritten text line images

Text Recognition

Transformers Supports Multiple Languages

Comic Interpreter is an automatic transcription generation system capable of recognizing text and image elements in comics and generating corresponding transcriptions.

Text Recognition

Transformers English

Layoutlmv3 Finetuned Funsd

A document understanding model based on LayoutLMv3-base and fine-tuned on the FUNSD dataset for token classification in forms and documents.

Text Recognition

An OCR model that supports recognition of Korean initial sounds (initial consonants), using an improved tokenizer to address the original TrOCR's weakness in this area.

Text Recognition

Transformers Korean

Olmocr 7B Thai V1

olmOCR is an optical character recognition model fine-tuned from Qwen2-VL-7B-Instruct. It converts document images, such as PDF pages, into text, and this version is further fine-tuned to improve recognition accuracy in specific scenarios.

Text Recognition

Safetensors Other

Table Transformer Structure Recognition V1.1 Pub

A Table Transformer model trained on the PubTables-1M dataset for table structure recognition in documents.

Text Recognition

Mlcd Vit Bigg Patch14 448

MLCD-ViT-bigG is an advanced Vision Transformer model enhanced with 2D Rotary Position Encoding (RoPE2D), excelling in document understanding and visual question answering tasks.

Text Recognition

Pix2Text's Mathematical Formula Detection (MFD) model for locating mathematical formulas in images.

Text Recognition Other

Layoutlmv2 Finetuned Funsd

A document understanding model fine-tuned on the FUNSD dataset based on Microsoft's LayoutLMv2

Text Recognition

PP DocLayout Plus L

PP-DocLayout_plus-L is a high-precision document layout detection model based on the RT-DETR-L architecture. It detects 20 common types of document elements.

Text Recognition Supports Multiple Languages

RT DETR L Wireless Table Cell Det

RT-DETR-L_wireless_table_cell_det is a high-precision table cell detection model designed specifically for table recognition tasks. It can accurately locate and mark each cell area in the table image.

Text Recognition Supports Multiple Languages

RT DETR L Wired Table Cell Det

RT-DETR-L_wired_table_cell_det is a key module in the table recognition task, mainly responsible for locating and marking each cell area in the table image.

Text Recognition Supports Multiple Languages

SLANeXt_wired is a deep learning model for table structure recognition that converts non-editable table images into editable table formats (such as HTML).

Text Recognition Supports Multiple Languages
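To make the "editable table format" in the entry above concrete, the sketch below turns an already recognized grid of cell texts into HTML. It only illustrates the kind of output such a pipeline produces; it is not the model's own API, and the function name `cells_to_html` and the row-list input format are hypothetical.

```python
from html import escape
from typing import List


def cells_to_html(rows: List[List[str]], header_rows: int = 1) -> str:
    """Render a grid of recognized cell texts as an HTML table.

    rows: cell texts in reading order, one inner list per table row.
    header_rows: how many leading rows to emit as <th> cells.
    """
    lines = ["<table>"]
    for r, row in enumerate(rows):
        tag = "th" if r < header_rows else "td"
        # Escape cell text so characters like & and < stay valid HTML.
        cells = "".join(f"<{tag}>{escape(text)}</{tag}>" for text in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


demo_html = cells_to_html([["Item", "Qty"], ["A&B", "2"]])
```

A real system would also need to handle merged cells (`rowspan` and `colspan`), which this sketch leaves out.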

Pix2text Table Rec

A table structure recognition model built on Microsoft's Table Transformer for detecting and recognizing tables in documents.

Text Recognition

SLANet_plus is a table structure recognition model that converts non-editable table images into editable table formats (such as HTML). It serves as the structure recognition stage of a table recognition system.

Text Recognition Supports Multiple Languages

TextNet is a lightweight, efficient architecture designed for text detection. It is available in three variants that balance detection accuracy against inference speed.

Text Recognition

PP DocBlockLayout

PP-DocBlockLayout is a document layout block detection model based on RT-DETR-L that identifies layout regions in various document types.

Text Recognition Supports Multiple Languages

Qari OCR V0.3 VL 2B Instruct

QARI-OCR v0.3 is an optical character recognition vision-language model focused on Arabic structured document understanding. It is built on Qwen2-VL-2B-Instruct and excels at preserving document layout and format.

Text Recognition

Transformers Arabic

PP OCRv4 Server Seal Det

The server-side seal text detection model of PP-OCRv4: a high-accuracy model for detecting text in seals, suited to server deployment.

Text Recognition Supports Multiple Languages

AIbase

Empowering the Future, Your AI Solution Knowledge Base


© 2025 AIbase